Improve ASR models' invariance to padding/batch size #13827
LGTM. Checked with parakeet models as well.
@pytest.mark.skip(reason="Used only for debugging.")
@pytest.mark.parametrize("length", [16000])
def test_canary_invariant_to_padding(deterministic_rng, length):
    model = ASRModel.from_pretrained("nvidia/canary-180m-flash").eval()
no pretrained :)
Just commenting for future reference. For Sortformer, Lhotse-based inference is supported but training is not supported yet.
[🤖]: Hi @pzelasko 👋, We wanted to let you know that a CICD pipeline for this PR just finished successfully. So it might be time to merge this PR or get some approvals.
@pzelasko I checked the diarization unit tests. As long as it passes all unit tests and CI tests, I think the change causes no issues for Sortformer diarization.
Great work!
* Fix feature extractor to be invariant to padding Signed-off-by: Piotr Żelasko <[email protected]>
* fix ci Signed-off-by: Piotr Żelasko <[email protected]>
* preliminary conformer inference parity with/without padding Signed-off-by: Piotr Żelasko <[email protected]>
* fix Signed-off-by: Piotr Żelasko <[email protected]>
* fix tests Signed-off-by: Piotr Żelasko <[email protected]>
* fixes Signed-off-by: Piotr Żelasko <[email protected]>
* fix CI check Signed-off-by: Piotr Żelasko <[email protected]>
* fix to cache-aware models Signed-off-by: Piotr Żelasko <[email protected]>
* fix a bunch of tests Signed-off-by: Piotr Żelasko <[email protected]>
* Fix failing CI tests Signed-off-by: Piotr Żelasko <[email protected]>
* Fix failing CI tests part 2 Signed-off-by: Piotr Żelasko <[email protected]>
* Unit test fixes for too short feature extractor inputs Signed-off-by: Piotr Żelasko <[email protected]>
* Resolved feature frame length issue in E2E diarization dataloader Signed-off-by: taejinp <[email protected]>
* Apply isort and black reformatting Signed-off-by: tango4j <[email protected]>
* fix ci Signed-off-by: Piotr Żelasko <[email protected]>
* removed test_ds from YAML file since it is not used Signed-off-by: taejinp <[email protected]>
* fix diarization unit tests after recent changes Signed-off-by: Piotr Żelasko <[email protected]>
---------
Signed-off-by: Piotr Żelasko <[email protected]>
Signed-off-by: taejinp <[email protected]>
Signed-off-by: tango4j <[email protected]>
Co-authored-by: oliver könig <[email protected]>
Co-authored-by: Charlie Truong <[email protected]>
Co-authored-by: taejinp <[email protected]>
Co-authored-by: tango4j <[email protected]>
Signed-off-by: Amir Hussein <[email protected]>
What does this PR do?
Adds tests and fixes inconsistencies in the ASR feature extractor and subsampling when the same input is processed with and without padding. Specifically:
audio_length < audio.shape["time"])
As a result, the models' WER varies much less with batch size, but the results are still not 100% identical across batch sizes. For example, for parakeet-tdt-0.6b-v2, parakeet-rnnt-1.1b, and canary-180m-flash, the absolute difference between batch sizes 128 and 512 was 0.01% WER.
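To illustrate the kind of inconsistency described above, here is a minimal pure-Python sketch of why normalization statistics must ignore padded frames. It is a toy, single-feature example and not the NeMo implementation; the names `masked_normalize` and `naive_normalize` are hypothetical and exist only for this illustration.

```python
import math


def masked_normalize(frames, valid_len, eps=1e-5):
    """Normalize using statistics from the first `valid_len` frames only.

    Padded positions (index >= valid_len) are forced to zero so they can
    never leak into later layers. Hypothetical helper, not a NeMo function.
    """
    valid = frames[:valid_len]
    mean = sum(valid) / valid_len
    var = sum((x - mean) ** 2 for x in valid) / valid_len
    std = math.sqrt(var) + eps
    return [(x - mean) / std if i < valid_len else 0.0 for i, x in enumerate(frames)]


def naive_normalize(frames, eps=1e-5):
    """Normalize over every frame, including padding (the problematic variant)."""
    return masked_normalize(frames, len(frames), eps)


# One utterance alone, and the same utterance zero-padded as if it shared
# a batch with a longer one.
signal = [0.5, -1.0, 2.0, 0.25]
padded = signal + [0.0] * 6

reference = masked_normalize(signal, len(signal))
masked = masked_normalize(padded, len(signal))
naive = naive_normalize(padded)

# Largest deviation from the unpadded reference over the valid frames.
max_diff_masked = max(abs(a - b) for a, b in zip(reference, masked))
max_diff_naive = max(abs(a - b) for a, b in zip(reference, naive))

print("masked vs. unpadded:", max_diff_masked)
print("naive  vs. unpadded:", max_diff_naive)
```

The masked variant reproduces the unpadded result on the valid frames, while the naive variant shifts with the amount of padding, which is the same mechanism by which batch composition can change model outputs.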
Comparison of all NVIDIA NeMo ASR models on Open ASR Leaderboard (offline only):
I also checked the results on NVTalks for one cache-aware model:
Collection: ASR
Changelog
Usage
# Add a code snippet demonstrating how to use this
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.
Additional Information